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General Overview 


1-Byzantine resilient C&DH system (fail-operational). 
- Uses triplex onboard computers (OBCs) executing identical flight software. 
- >1FT relies on sparing and crew intervention (e.g. independent backup). 
Assumes classical reliability requirement of 10^-9 failures/hour.
Realizable with currently available COTS technology.*
- E.g. can be implemented using a variety of SBCs and real-time OSs.
Scalable fault tolerance (both in classification and quantity).
- E.g. through additional network planes, high-integrity devices, etc.


Assumes full cross strapping between OBCs, network switches, and
end devices/subsystems (e.g. RIUs, IMUs, MBSUs).
- Minimizes number of 2-fault combinations which can cause system failure.
- Prioritizes high data availability and architectural flexibility over low SWaP.

Redundant Time-Triggered Ethernet network used for data exchange
and synchronization between computing platforms.


- Eliminates need for independent Cross-Channel Data Link (CCDL). 


* This presentation proposes the use of TTTech’s rad-hard space ASIC (available Q3 2017). 
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Different Fault Classifications (there is overlap) 


Fault Type    Description                                         System

Fail-Stop     [description illegible in source]                   Failover/Standby
              E.g. Process halts before “send to all”.

[illegible]   The node does not produce any output.
              Can remain undetected by good nodes.

Omission      Follows algorithm, but messages are lost.

Value         Node produces incorrect computation result.         N-Modular Redundancy
                                                                  (synchronized majority voting)

Timing        Outputs are delivered too early or too late.
              I.e. Node does not meet temporal specifications.

Symmetric     Peers see the fault manifest in the same way.
              E.g. Node sends arbitrary data to all or nobody.

Byzantine     Peers see the fault manifest in different ways.     Byzantine Agreement
              E.g. Node sends different data to different peers.
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gs Where does byzantine tolerance matter? Agreeing on input data 
- Problem: single source (internal or external) distribution to multiple receivers. 


- In our case, the input seen by each redundant processor must be bitwise 
identical — i.e. have interactive consistency. 


- Why? If all processors get the same input, then all non-faulty processors are 
guaranteed to produce identical output. 
> Can be used to ID faulty processors and resolve commands sent by the OBCs. 
Consensus versus Correctness
- A faulty input device may provide arbitrary input data to the OBCs.


- The purpose is to guarantee all OBCs have the same view of the system, and 
can therefore decide on the same input value. 
> I.e. the IC exchange guarantees consensus, but not that the input is “correct”.


- If an accurate input value is important, you need redundant input devices. 


Avoiding hardware shortcuts
- It is tempting to try circumventing the problem through increased connectivity. 
> E.g. Trying to ensure all OBCs read some input data from the same shared wire. 


- However, a faulty device may transmit a marginal signal that may be interpreted 
as different values by different OBCs. 
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Rules for Interactive Consistency 


What is an interstage?


- An interstage is an FCR that participates in the interactive consistency 
exchange, but does not require consensus. 


- The purpose of an interstage is to provide the necessary functionality to 
perform byzantine agreement algorithms without requiring all FCRs to 
be full processors. 

Rules for interactive consistency in 1FT voting systems:

- Requires ≥ 3(1) + 1 = 4 Fault Containment Regions (FCRs).

- Each interstage must receive data through ≥ 1 disjoint paths.

- Devices requiring consensus get data from ≥ 2(1) + 1 = 3 disjoint paths.

- Above must be satisfied in (1) + 1 = 2 rounds of data exchange.


- After data exchange, devices requiring consensus perform an absolute 
majority vote of received messages. 
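
The rules above are written for one fault, with the "(1)" marking the number of tolerated faults. As a minimal sketch, assuming the "(1)" generalizes to an arbitrary fault count f (the slides only state the f = 1 case), the counts can be computed as follows. The function name `ic_requirements` is hypothetical.

```python
def ic_requirements(f: int) -> dict:
    """Minimum resources for interactive consistency with f arbitrary faults.

    Assumption: the "(1)" in the slide formulas stands for f.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return {
        "fcrs": 3 * f + 1,             # fault containment regions
        "consensus_paths": 2 * f + 1,  # disjoint paths into each voting device
        "rounds": f + 1,               # rounds of data exchange
    }


print(ic_requirements(1))
```

For f = 1 this reproduces the 4 FCRs, 3 disjoint paths, and 2 rounds listed above.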
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o General Overview Cross-Channel Data Link (CCDL) 
A 1FT design can be realized with either:
1. 4 full processors/OBCs
2. 3 OBCs + 1 interstage

End devices are networked directly to one of the OBCs via a bus.

Fully channelized design: each OBC has access only to devices on its own local bus.

Requires independent CCDL for data exchange and synchronization (or an external reference).
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Classical Approach — Channelized Bus asa (nes 


[Figure: Cross-Channel Data Link (CCDL) connecting OBC1, OBC2, OBC3 and an interstage. Each has a CCDL interface; each OBC also has a bus interface. A redundant external timing reference is shown.]

Meeting Requirements
- ≥ 3(1) + 1 FCRs? Yes: each OBC/interstage + its CCDL links (4 FCRs total).
- ≥ 2(1) + 1 disjoint paths b/w FCRs? Yes.
- (1) + 1 rounds of data exchange? Yes: performed in succession over the CCDL.
- (4 - 1) + 4(4 - 1) = 15 msgs per exchange.
Examples
- NASA X-38, LM X-33, NASA Ares I, ULA Delta IV.
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Certain BFT SM requirements are not fully realizable: 
I. A non-faulty OBC’s signature cannot be forged.
   - Requires 260-bit signatures, which are computationally expensive.
II. Any alteration of a message can be detected.
   - Schrödinger’s CRC: a single stuck-at-1/2 bit can result in different messages that look “correct” to multiple receivers.


Cryptography in flight control systems represents a different set of priorities.
- Low SWaP > long-term reliability.

[Figure: OBC1, OBC2, OBC3, each with a CCDL interface and a bus interface.]

Detect lying using authentication
- Many launcher applications relax the requirement for 4 FCRs by using the idea of “unforgeable” signed messages.
> Insufficient reliability for long missions.
Relaxed Requirements
- ≥ 2(1) + 1 = 3 FCRs: each OBC + links.
- ≥ (1) + 1 = 2 disjoint paths between FCRs.
- (3 - 1) + 3(3 - 1) = 8 messages per exchange.
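
The two message counts quoted in the slides (15 for the classical 4-FCR design, 8 for the signed-message 3-FCR design) come from the same expression. A small sketch that simply evaluates that expression, with the hypothetical name `messages_per_exchange`:

```python
def messages_per_exchange(n_fcrs: int) -> int:
    """Evaluate the slides' expression (n - 1) + n(n - 1) for n FCRs."""
    return (n_fcrs - 1) + n_fcrs * (n_fcrs - 1)


print(messages_per_exchange(4))  # classical design with 4 FCRs
print(messages_per_exchange(3))  # signed-message design with 3 FCRs
```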
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g Step 1: Read data 


- OBCs 1-3 each read data from their local input device.
> No guarantee the data agrees.

[Figure: OBC1, OBC2, OBC3, each reading from its own input device.]
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Channelized Bus — Reading Data (2) 


Step 2: Exchange
- OBCs 1-3 send their initial values to OBCs 1-3 + interstage.
> An OBC may “lie” arbitrarily to its peers (results in an asymmetric view).

[Figure: OBC1, OBC2, OBC3 and the interstage exchanging values over the Cross-Channel Data Link (CCDL).]
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CCDL Interface 


Step 3: Exchange (Round 2)
- OBCs 1-3 + interstage send round 1 data to all OBCs 1-3 (round 2).
> Still, any FCR 1-4 could fail asymmetrically.
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Both non-faulty OBCs 
now share the same 
view of the system. 


Step 4: Create symmetry
- OBCs 1-3 perform majority voting of round 2 data to “correct” round 1 data (non-faulty OBCs now share the same IC vector).
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Channelized Bus — Reading Data 


Step 5: Make a decision
- OBCs 1-3 execute a choice() function to select a final value (e.g. median, mean).

All non-faulty OBCs will select the same input data.
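
The five steps above (read, exchange, relay, vote, decide) can be sketched as a small simulation. This is an illustrative model, not the flight implementation: FCR 4 stands in for the interstage, `send` is a hypothetical link model, and the faulty behavior of OBC2 is an invented example of an asymmetric fault.

```python
from collections import Counter
from statistics import median

OBCS = (1, 2, 3)
FCRS = (1, 2, 3, 4)  # FCR 4 is the interstage


def strict_majority(values, default=None):
    """Return the value held by more than half of `values`, else `default`."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else default


def ic_exchange(readings, send):
    """Run the two-round exchange; return each OBC's IC vector.

    send(src, dst, value) models the link: it returns what dst receives.
    """
    # Round 1: every OBC sends its reading to every other FCR.
    round1 = {(s, d): send(s, d, readings[s])
              for s in OBCS for d in FCRS if d != s}
    vectors = {}
    for p in OBCS:
        vector = {}
        for s in OBCS:
            if s == p:
                vector[s] = readings[p]  # own reading, used directly
                continue
            copies = [round1[(s, p)]]
            # Round 2: every other FCR relays what it heard from s.
            copies += [send(q, p, round1[(s, q)])
                       for q in FCRS if q not in (s, p)]
            vector[s] = strict_majority(copies)
        vectors[p] = vector
    return vectors


def byzantine_obc2(src, dst, value):
    """Hypothetical fault: OBC2 tells each receiver something different."""
    lies = {1: 7, 3: 9, 4: 5}
    return lies[dst] if src == 2 else value


def choice(vector):
    """Mid-value selection over the entries that reached a majority."""
    return median(v for v in vector.values() if v is not None)


vectors = ic_exchange({1: 5, 2: 5, 3: 5}, byzantine_obc2)
print(vectors[1], vectors[3])
print(choice(vectors[1]), choice(vectors[3]))
```

In this run the two non-faulty OBCs end up with identical IC vectors even though OBC2 gave each of them a different value, so any deterministic choice() yields the same result on both.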


Approved for Public Release - No Export Controlled Data

Andrew Loveless (NASA JSC/EV2)

Channelized Bus - Commanding


Step 1: Prepare Command
- After computation, OBCs 1-3 each generate a command.
> All non-faulty OBCs agree.
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gw Step 2: Exchange 


- OBCs 1-3 send their output values to OBCs 1-3.
> Again, an OBC may “lie” arbitrarily to its peers (results in an asymmetric view).
> This behavior is tolerated, since the non-faulty OBCs do not need to have consensus on the entire view of the system.
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2) Non-faulty OBCs can 
identify other faulty FCRs, but their view may not be consistent!

Step 3: Majority Vote
- Each of OBCs 1-3 performs a majority vote to correct its initial output value.
> Process can be used to detect faulty OBCs and initiate fault recovery or system reconfiguration.
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CCDL Interface 
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CCDL Interface CS CCDL Interface 
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Bus Interface 


Step 4: Transmit Command
- OBCs 1-3 send the command to the output device connected to their local bus.
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Channelized Bus — Detailed Exchange 


Information Exchange — Round 1 


The value received from OBC2 is different for each non-faulty FCR.
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Channelized Bus — Detailed Exchange 


Information Exchange — Round 2 1 2 3 
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Channelized Bus — Detailed Exchange vasa ("es 


Create Symmetry - Majority Voting
On OBCs 1-3, each element in the interactive consistency (IC) vector is set to the strict majority of its children.
> I.e. OBCs 1,3 must agree on data from OBC 2.

Making a Decision
On OBCs 1-3, a choice() function is used to determine a final value from those contained in the IC vector.
> E.g. a mid-value selection.

All non-faulty OBCs now agree on the data originally sent by OBCs 1-3.
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In a channelized system, 
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? 
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1 remediating OBCs 1-3 is 
the same as remediating IMUs 1-3 (the devices).
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gs “Flattening’” the classical two-round exchange 
- Can be analyzed as messaging over redundant paths (from different FCRs). 
- Makes it easier to see why 4 FCRs and 3 disjoint paths are necessary. 
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IC Exchange — Alternate Viewpoint (2) as hes 




For simplicity, not all Round 2 messages are shown. 




Generalizing use of interstages (1)

Example 1:
- Four total FCRs
- Two interstages
- Two devices require consensus

Rules for IC in 1FT voting systems:
- Requires ≥ 3(1) + 1 = 4 FCRs.
- Interstages need data from ≥ 1 paths.
- Devices requiring consensus need data from ≥ 2(1) + 1 = 3 disjoint paths.
- Two rounds of data exchange.
- Devices requiring consensus perform majority vote over received messages.

[Figure: Round 1 and Round 2 of the exchange, with path labels "3 paths" and "2 paths". The originating device sends 5. The two devices requiring consensus receive (5, X, 5) and (5, Y, 5), and both decide Final: 5.]

Legend: device requiring consensus; interstage (does not require consensus); originating device; faulty device.

Assumption: Any device may fail arbitrarily
(omission, symmetric, asymmetric, byzantine).
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Example 2: 
- Five total FCRs
- Three interstages

Rules for IC in 1FT voting systems:
- Requires ≥ 3(1) + 1 = 4 FCRs.
- Interstages need data from ≥ 1 paths.
- Devices requiring consensus need data from ≥ 2(1) + 1 = 3 disjoint paths.
- Two rounds of data exchange.
- Devices requiring consensus perform majority vote over received messages.

Legend: device requiring consensus; interstage (does not require consensus); originating device; faulty device.

Assumption: Any device may fail arbitrarily
(omission, symmetric, asymmetric, byzantine).
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Two devices require consensus 
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High-Integrity Devices in TT Ethernet NASA Aes 


High-Integrity Design
Command/Monitor (COM/MON) design aims for error containment within the device.
- Contains two fault containment regions.
- Input is forwarded to both COM and MON.
- Congruency exchange ensures both COM and MON have identical input data (i.e. IC).
- Both COM and MON process data in parallel.
- Output from COM is forwarded to MON.
- If disagreement, MON terminates the transmission.

Note that a transmitted message may be [illegible in source] ... rejects the message.

Device Failure Assumptions
Standard devices may be subject to byzantine failures.
- Device may send arbitrary messages (of any contents).
- Device may transmit messages at arbitrary points in time.
- Device may send different messages through different network planes (channels).

High-Integrity devices may be subject to inconsistent omission failures.
- Faulty device will not create (nor modify existing to produce) a new valid message.
- Device may drop or fail to receive an arbitrary number of messages.
- Device may fail to relay messages asymmetrically: some receivers may not get data.
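
The COM/MON behavior described above can be illustrated with a short sketch. It is a simplification under stated assumptions: COM and MON are modeled as plain functions, and a terminated transmission is modeled as returning `None`. The names `com_mon_transmit`, `forward`, and `corrupt` are hypothetical.

```python
from typing import Callable, Optional


def com_mon_transmit(frame: bytes,
                     com: Callable[[bytes], bytes],
                     mon: Callable[[bytes], bytes]) -> Optional[bytes]:
    """Return COM's output if MON agrees, else None (transmission cut off)."""
    com_out = com(frame)  # COM processes the input
    mon_out = mon(frame)  # MON processes the same input in parallel
    return com_out if com_out == mon_out else None


def forward(frame: bytes) -> bytes:
    """Healthy lane: pass the frame through unchanged."""
    return frame


def corrupt(frame: bytes) -> bytes:
    """Hypothetical faulty lane: flips the lowest bit of the last byte."""
    return frame[:-1] + bytes([frame[-1] ^ 0x01])


print(com_mon_transmit(b"\x05", forward, forward))
print(com_mon_transmit(b"\x05", corrupt, forward))
```

The point of the pairing is that a single faulty lane can cause an omission but cannot produce a new valid message, which matches the inconsistent-omission failure model listed for high-integrity devices.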
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Rules for Interactive Consistency 


What is an interstage?

- An interstage is an FCR that participates in the interactive consistency
exchange, but does not require consensus.

- The purpose of an interstage is to provide the necessary functionality to
perform byzantine agreement algorithms without requiring all FCRs to
be full processors.

Rules for interactive consistency in 1FT voting systems:
- Requires ≥ 3(1) + 1 = 4 Fault Containment Regions (FCRs).

- Each interstage must receive data through ≥ 1 disjoint paths.

- Devices which require consensus must get data from:

I. ≥ 2(1) + 1 = 3 standard-integrity devices, or

II. ≥ (1) + 1 = 2 high-integrity devices, or

III. A combination of the above

- Above must be satisfied in (1) + 1 = 2 rounds of data exchange.


- After data exchange, devices requiring consensus perform an absolute 
majority vote of received messages. 
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Generalizing use of (HI) interstages 


Example 3:
- Six total FCRs
- Two HI interstages
- Two devices require consensus

Rules for IC in 1FT voting systems:
- Requires ≥ 3(1) + 1 = 4 FCRs.
- Interstages need data from ≥ 1 paths.
- Devices requiring consensus need data:
  I. from ≥ 2(1) + 1 = 3 LI devices
  II. from ≥ (1) + 1 = 2 HI devices
  III. from a combination of the above
- Two rounds of data exchange.
- Majority vote used to reach consensus.

[Figure: Round 1 and Round 2 of the exchange, with path labels "1 path" and "2 paths". The originating device sends 5. The two devices requiring consensus receive (5, 5) and (5), and both decide Final: 5.]

Legend: device requiring consensus; LI interstage (does not require consensus); HI interstage (does not require consensus); originating device; faulty device.

Assumption: LI devices may fail arbitrarily, HI
devices may fail via inconsistent omission.
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Generalizing use of (HI) interstages 


Example 4:
- Six total FCRs
- One HI + two LI interstages
- Two devices require consensus

Rules for IC in 1FT voting systems:
- Requires ≥ 3(1) + 1 = 4 FCRs.
- Interstages need data from ≥ 1 paths.
- Devices requiring consensus need data:
  I. from ≥ 2(1) + 1 = 3 LI devices
  II. from ≥ (1) + 1 = 2 HI devices
  III. from a combination of the above
- Two rounds of data exchange.
- Majority vote used to reach consensus.

[Figure: Round 1 and Round 2 of the exchange, with path labels "3 paths" and "3 paths". The originating device sends 5. One device requiring consensus receives (5, 5); the other receives three values, of which the middle one is garbled in the source. Both decide Final: 5.]

Legend: device requiring consensus; LI interstage (does not require consensus); HI interstage (does not require consensus); originating device; faulty device.

Assumption: LI devices may fail arbitrarily, HI
devices may fail via inconsistent omission.
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gs General Overview povverccoseecccaseeccnans 1 cama 7 


Scalable 1FT design can be realized with:
- 3 full processors/OBCs
- 2-3 redundant network planes (interstages).
- Majority voting of redundant messages.

Fully cross-strapped design: each OBC has access to any networked device.

Time-Triggered Ethernet network provides data distribution and synchronization between platforms.
- Does not require separate CCDL or timing/synchronization hardware.

Triplex OBCs do not directly interface to any end devices (insulated by network).

[Figure: OBC1, OBC2, OBC3 (COTS SBCs) connected through the network to output devices. Visible output labels: OUT2, OUT3.]
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OBC1 


Device Characteristics
- COM/MON switches, standard integrity ESs.
- Error Containment Unit b/w switch ingress/egress.
- Switches provide 1FT or 2FT availability depending on number of channels.

COM/MON switches required as trusted Compression Masters (CM) for sync.

HI switches cannot protect against valid frames containing erroneous data.
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owitched Triplex 


Required redundant channels

A 1FT configuration requiring only two network planes is possible only if switches are fully self-checking (fail-silent).

A restricted failure mode model requires the realization of two independent FCRs.
- Inconsistent omission is a reduced model.
- Must eliminate common mode elements, e.g. shared timer, dielectric isolation, physical space, temperature.

If the switch may fail arbitrarily, then three redundant channels are always required.

In all cases, 3x channels result in fewer two-fault combinations that cause system failure than 2x channels.

Current Implementation

TTTech COM/MON devices share power (with separate power monitor).

A shared oscillator is used for COM/MON, with a dedicated clock monitor to prevent common mode clock failures.

Fault propagation from switches theoretically requires dual-correlated simultaneous faults.
> 1e-6 × 1e-6 = ~1e-12 failures/hour
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Even with extensive self-checking, some fault 
modes could “escape”. We must either: 


1. Prove complete coverage, or 


2. Design the system to tolerate the escape 
of an arbitrary fault (i.e. 3 channels). 
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gm Step 1: Exchange (Round 1) cau © nea ee ; 
- Each redundant input device (any #) transmits its data to switches 1-3.
> No guarantee non-faulty devices agree.
> A failed device may transmit arbitrarily.

Note: Switches must act as guardians to prevent input device babbling (temporal).
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Switched Triplex — Reading Data (2) 


Step 2: Exchange (Round 2)
- Switches 1-3 send each redundant input message to all OBCs 1-3.

[Figure: OBC1 and OBC2 (COTS SBCs) receiving from the switches.]
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gw Siep 3: Create symmetry 

- OBCs 1-3 perform a majority vote of the message copies received from each redundant network channel.
> Messages that violate the protocol are dropped.
> Majority must be determined according to the number of messages received (i.e. not static 2/3).
> Non-faulty OBCs now share the same IC vector.
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Bitwise majority voting of messages 
received over redundant channels can be implemented in the TTE ES or driver. The process is identical for all incoming TT traffic.


Note: Voting is over one VL. 
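
A bitwise majority vote whose threshold follows the number of copies actually received (rather than a static 2-of-3) might look like the sketch below. It is not the TTE end system or driver code: the function name `bitwise_majority` is hypothetical, and the sketch assumes all surviving copies of a frame have equal length.

```python
def bitwise_majority(copies):
    """Vote each bit across the copies that were actually received.

    `copies` holds the frames that survived protocol checks (dropped
    frames are simply absent). A bit is set in the result only if more
    than half of the received copies have it set.
    """
    if not copies:
        return None
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must have equal length")
    needed = len(copies) // 2 + 1  # threshold depends on copies received
    voted = bytearray(length)
    for i in range(length):
        for bit in range(8):
            ones = sum((c[i] >> bit) & 1 for c in copies)
            if ones >= needed:
                voted[i] |= 1 << bit
    return bytes(voted)


print(bitwise_majority([b"\x05\x10", b"\x05\x10", b"\xff\x00"]))
print(bitwise_majority([b"\x05\x10", b"\x05\x10"]))
```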
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Data remediation (choice) of 
messages received from multiple devices is implemented in the application (app specific).

Note: Choice is over multiple VLs.

Step 4: Make a decision
- OBCs 1-3 execute a choice() function to select a final value from the redundant input devices (e.g. median, mean).
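
Since choice() is application specific, the following is only one possible shape for it: a mid-value (median) selection over the already-voted values of several redundant devices. The device names and the convention that a missing value is `None` are assumptions made for this sketch.

```python
from statistics import median


def choice(voted_values):
    """Select a final value from the voted outputs of redundant devices.

    `voted_values` maps a device (one VL each) to its voted value, or to
    None if no copy of that device's message survived voting.
    """
    present = [v for v in voted_values.values() if v is not None]
    if not present:
        return None
    return median(present)


# Hypothetical readings: one IMU reports an outlier, which the median ignores.
print(choice({"imu1": 5.0, "imu2": 5.2, "imu3": 97.0}))
```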
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g Step 1: Prepare Command eee a i 
- After performing computation, OBCs 1-3 each generate a command.
> All non-faulty OBCs agree on the output.

[Figure: each OBC connects to the network through its TTE ES (NIC).]
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gw Step 2: Exchange (Round 1) ea 
- Each OBC 1-3 transmits its output value to all switches 1-3.
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gm Step 3: Exchange (Round 2) 


- Switches 1-3 send each input message to all redundant output devices (any #).


Step 4: Create symmetry
- Each output device performs a majority vote of messages received from each channel.
> This IC exchange is required to ensure consensus of multiple output devices in case of one OBC.

[...] be implemented in the NIC or driver.
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gw Step 5: Make a decision 


- Each output device performs a second majority vote over the commands from each OBC.
> I.e. the choice() function for output devices is always a bitwise majority.
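
The two votes performed by an output device (first across channels for each OBC, then across OBCs) can be sketched as follows. For brevity this sketch votes on whole messages rather than bit by bit, which is a simplification of the bitwise majority named above. The function names, channel names, and command values are all hypothetical.

```python
from collections import Counter


def majority(values):
    """Absolute majority of the values actually present, else None."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    value, count = Counter(present).most_common(1)[0]
    return value if 2 * count > len(present) else None


def output_device_decide(received):
    """received[channel][obc] holds the command copy seen on that channel."""
    obcs = sorted({obc for copies in received.values() for obc in copies})
    # First vote: per OBC, across the redundant channels.
    per_obc = {obc: majority([copies.get(obc) for copies in received.values()])
               for obc in obcs}
    # Second vote: across the OBCs' voted commands.
    return majority(list(per_obc.values())), per_obc


# Hypothetical fault: OBC2 puts a different command on each channel.
received = {
    "switch1": {1: b"OPEN", 2: b"SHUT", 3: b"OPEN"},
    "switch2": {1: b"OPEN", 2: b"OPEN", 3: b"OPEN"},
    "switch3": {1: b"OPEN", 2: b"HALT", 3: b"OPEN"},
}
command, per_obc = output_device_decide(received)
print(command, per_obc)
```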
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TTE ES (NIC) 


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) Happening Simultaneously 


Step 3: Exchange (Round 2)
- Switches 1-3 send each input message “reflected” back to each OBC 1-3.
> For the purpose of fault detection and reconfiguration.
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) Happening Simultaneously 


Step 4: Create symmetry
- Each OBC 1-3 performs a majority vote of messages received from each channel.
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) Happening Simultaneously 


Step 5: Identify faulty OBC
- OBCs 1-3 perform a majority vote over the commands from each OBC.
> Identical to action performed by OUT 1-3.
> Can be used to identify OBCs that do not agree with the majority (for FDIR).
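
One way to turn the vote over OBC commands into a list of suspects for FDIR is sketched below. The function name `flag_disagreeing` and the command values are hypothetical, and how suspects are handled afterwards is outside the scope of this sketch.

```python
from collections import Counter


def flag_disagreeing(commands):
    """Return (majority command, OBCs that disagree with it).

    `commands` maps an OBC id to its voted command, or None if no copy
    survived the per-channel vote. With no absolute majority, nothing
    can be concluded and no OBC is flagged.
    """
    present = [c for c in commands.values() if c is not None]
    if not present:
        return None, []
    winner, count = Counter(present).most_common(1)[0]
    if 2 * count <= len(commands):
        return None, []
    suspects = sorted(obc for obc, c in commands.items() if c != winner)
    return winner, suspects


print(flag_disagreeing({1: b"OPEN", 2: b"SHUT", 3: b"OPEN"}))
```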
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- When sharing a value between OBCs 
(e.g. output monitoring, shared state), the original sender cannot use its value directly. Instead, it performs a majority vote of the values reflected back from the switches (i.e. IC).

- This ensures consensus in case of an arbitrary transmission error.

[Figure: two panels (titles illegible in source), each showing Round 1 and Round 2. In the first panel all three OBCs vote over (5, X, X) and hold Final: X. In the second panel the sender keeps its original 5 (Final: 5) while the other two vote over (5, X, X) and hold Final: X.]
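
The reason the sender must vote on the reflected copies can be shown with a few lines. In this sketch the value 9 plays the role of X, and the assumption is that a transmission error corrupts the sender's value on two of the three channels; the variable names are illustrative.

```python
from collections import Counter


def majority(values):
    """Absolute majority of `values`, else None."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else None


original = 5
# Hypothetical transmission error: two of three channels carry 9.
reflected = {"switch1": 5, "switch2": 9, "switch3": 9}

receiver_view = majority(list(reflected.values()))    # what the other OBCs decide
sender_if_direct = original                           # sender trusts its own value
sender_if_voted = majority(list(reflected.values()))  # sender votes on reflections

print(receiver_view, sender_if_direct, sender_if_voted)
```

Only the voted result matches what the receivers hold; using the original value would leave the sender out of consensus with its peers.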
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gs Network-Level IC = no host blocking 


- Consensus between multiple receivers can be achieved 
transparent to the flight software (no impact on CFS). 


- If you read a value, you already know it is the voted answer from 
a two round exchange — consistent across all receivers (1FT). 


- Eliminates classical “acceptance window” for exchanges.
- No need for “read, send, wait ... read, send, etc.” 
- Minimizes use of host resources (especially if in NIC). 
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gs |The Role of the Remote Interface Unit (RIU) 


- The RIU acts as a gateway between the TTE network, analog 
devices, and legacy buses (e.g. MIL-STD-1553, ARINC 429). 

- Moves signal conditioning closer to sensor/effectors, reducing 
noise and wiring mass. 

- Functions it may implement include A/D conversion, network 
formatting, range checking, scaling, linearization, and 
threshold/filter services specific to each subsystem. 


- Uses configuration files to map local buffer data to TTE dataports. 
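
To make the configuration-driven mapping concrete, here is a purely hypothetical sketch of how one table entry could tie a local buffer slot to a TTE dataport, along with the scaling and range checking mentioned above. The configuration format, field names, and numeric values are invented for illustration and do not describe any actual RIU.

```python
# Hypothetical configuration: maps a local buffer slot to a TTE dataport
# and lists the conditioning applied on the way. All names are illustrative.
CONFIG = {
    "tank_temp": {
        "buffer_index": 0,
        "dataport": 12,
        "scale": 0.5,
        "offset": -50.0,
        "valid_range": (-40.0, 85.0),
    },
}


def condition(name, raw_buffer, config=CONFIG):
    """Scale one raw sample and range-check it; return (dataport, value, ok)."""
    entry = config[name]
    raw = raw_buffer[entry["buffer_index"]]
    value = raw * entry["scale"] + entry["offset"]  # linear scaling
    low, high = entry["valid_range"]
    return entry["dataport"], value, low <= value <= high


print(condition("tank_temp", [250]))
print(condition("tank_temp", [300]))
```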
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Approach 1: 
- One RIU
- One sensor

Problems?
- Sensor data sent to RIU may be wrong.

The Fix:
- Add redundant sensors and have RIU remediate between them.

Legend: Onboard Flight Computer; Remote Interface Unit (RIU); Sensor or Actuator; TTE network switch (COM/MON); faulty device.
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Approach 2: Subsystem cannot function 


- One RIU
- Remediation b/w multiple sensors

Problems?
- RIU could fail internally, resulting in:
  1. No transmission
  2. Symmetric faulty transmission

Triplex has consensus on:
1. Non-existent data
2. Incorrect data

The Fix:
- Increase resilience of the RIU:
  1. TMR of processor elements (e.g. Maxwell SCS750 used on ESA Gaia satellite).
  2. True dual-core lock-step processor (i.e. fully isolated self-checking).
     * COTS products like ARM Cortex-R4/R5 not available in rad-tolerant variants.

Not good enough? Replicate the RIU.

Legend: Onboard Flight Computer; Remote Interface Unit (RIU); Sensor or Actuator; TTE network switch (COM/MON); faulty device.
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No Export Controlled Data 


Approach 3:
- One RIU with HI processor
- Remediation b/w multiple sensors

Problems?
- TTE ES could fail arbitrarily, resulting in:
  1. No transmission
  2. Symmetric faulty transmission
  3. Byzantine transmission

The Fix:
- Increase resilience of the end system:
  1. TMR in the TTE Chip-IP MAC layer.
  2. Use a COM/MON HI end system.
  Not available in TTTech Space ASIC.

Not good enough? Replicate the RIU.

Legend: Onboard Flight Computer; Remote Interface Unit (RIU); Sensor or Actuator; TTE network switch (COM/MON); faulty device.
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Approach 4: 
- Multiple RIUs
- Each reads redundant sensors

Problems?
- None. Any arbitrary failure of an RIU is tolerated by the Triplex computers.
- Choice() function is application specific.

Caveats:
- Each RIU performs only minimal local processing (e.g. message packing).
- No consensus is required between RIUs before transmitting data.
- Since OBCs make decisions, OBCs require the consistency.

Legend: Onboard Flight Computer; Remote Interface Unit (RIU); Sensor or Actuator; TTE network switch (COM/MON); faulty device.
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Subsystem able to function 


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Approach 5: 

- Multiple RIUs
- Each reads redundant sensors
- RIUs require consensus

Description
- If consensus between RIUs is necessary without interacting with the OBCs, then IC can be performed between RIUs.
- Uses redundant network channels to provide the necessary FCRs.
- Process is similar to classical channelized bus voting approach.

Caveats:
- Can make architecture much more complex.
- 1FT bus commanding may require 3 RIUs.

[Figure note, partly garbled in source: "... achieve consensus at ... level." Subsystem able to function.]

Legend: Onboard Flight Computer; Remote Interface Unit (RIU); Sensor or Actuator; TTE network switch (COM/MON); faulty device.
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RC frames can 
be generated by COTS devices.

Time-Triggered (SAE AS6802)
(Vehicle Command and Control)
- All messaging is into/out of C&DH system.
- Periodic and generally low bandwidth.

Rate-Constrained (A664-p7)
(Asynchronous critical systems)
- Traffic shaping and policing ensures successful message delivery.
- Provides event-driven communication between synchronization domains.

Best-Effort (IEEE 802.3)
(Crew interfaces and science)
- Classical LANs can run isolated from or overlapping TT/RC network.
- COTS hardware easily upgraded.

High Speed Serial
(P2P, minimal networking)
- Provides >1 Gbit/s point-to-point or (possibly) networked messaging.
- Mostly related to off-board communication.

[Figure: network overview centered on the 1FT C&DH System, after [1]. Labeled elements:
- Effectors: heaters, pumps, thrusters.
- Sensor Data (High rate): optical navigation, autonomous systems.
- Sensor Data (Low rate): star tracker, IMU/SIGI, sun sensor, temperature, humidity, oxygen, CO2, flow rate, voltage.
- Distributed Processing: RIU/DAU, star tracker, propulsion, ECLSS.
- Classical Ethernet LAN (IEEE 802.11n): cameras, audio, and portable devices; onboard displays; direct audio/video signals.
- Onboard Gateway; DTN Storage/Processing; Command/Telemetry Processing.
- Transponders (SDR): S-band, Ka-band, X-band, Proximity (UHF).
- Link rate labels: < 5 Mbit/s, < 10 Mbit/s.]

[1] Rakow, Glenn. Spacecraft Crew-Vehicle Avionics Networks and Communication Flow.
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RC frames can 
be generated by COTS devices.

Rate-constrained traffic can be used by subsystems traditionally limited to P2P comm.

[Figure: the network overview from the previous slide, repeated. Labels not legible on the previous slide: Data Recorders, Docking Interface, Valves, Motors, Equipment unique cabling. Source: [1] Rakow, Glenn. Spacecraft Crew-Vehicle Avionics Networks and Communication Flow.]
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RC frames can 
be generated by COTS devices.

[Figure: the network overview from the previous slides, repeated, with an additional "Voting at Interface" label next to the Docking Interface. Source: [1] Rakow, Glenn. Spacecraft Crew-Vehicle Avionics Networks and Communication Flow.]




Questions? 
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